Toward a Totally Unsupervised, Language-Independent Method for the Syllabification of Written Texts

Author

  • Thomas Mayer
Abstract

Unsupervised algorithms for the induction of linguistic knowledge should ideally require as few basic assumptions as possible and, at the same time, in principle yield good results for any language. In practice, however, such algorithms are usually tested on only a few (closely related) languages. In this paper, an approach is presented that takes typological knowledge into account in order to induce syllable divisions fully automatically from written texts of reasonable size. Our approach can account for the syllable structures of languages on which other approaches fail. This raises the question of whether computational methods can really be claimed to be language-universal when they have not been tested on the variety of structures found in the world's languages.
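The abstract does not spell out the algorithm. To make the task concrete, here is a minimal sketch of one common unsupervised pipeline for syllabifying written text. It is not claimed to be the paper's method. Step one uses Sukhotin's algorithm, a known swapped-in technique, to separate vowel letters from consonant letters using only letter adjacency counts. Step two places each syllable boundary so that the next syllable receives the longest consonant cluster that is also attested word-initially in the text (maximum onset under a legality constraint). The function names `sukhotin_vowels`, `initial_clusters` and `syllabify` are hypothetical. Treating each maximal run of vowel letters as one nucleus is a simplifying assumption.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Classify letters as vowels with Sukhotin's algorithm (sketch)."""
    # Symmetric counts of adjacent letter pairs; identical pairs are ignored
    # (zero diagonal), as in the classic formulation.
    counts = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                counts[a][b] += 1
                counts[b][a] += 1
    # Row sums: how often each letter is adjacent to a different letter.
    sums = {c: sum(counts[c].values()) for c in letters}
    vowels, consonants = set(), set(letters)
    while consonants:
        # Sorting makes tie-breaking deterministic (alphabetical).
        best = max(sorted(consonants), key=lambda c: sums[c])
        if sums[best] <= 0:
            break  # no remaining letter looks vowel-like
        vowels.add(best)
        consonants.remove(best)
        # Letters that often neighbour the new vowel become less vowel-like.
        for c in consonants:
            sums[c] -= 2 * counts[c][best]
    return vowels


def initial_clusters(words, vowels):
    """Collect word-initial consonant clusters, i.e. onsets attested in the text."""
    onsets = {""}  # an empty onset is always allowed
    for word in words:
        i = 0
        while i < len(word) and word[i] not in vowels:
            i += 1
        if i < len(word):  # only words that contain a vowel
            onsets.add(word[:i])
    return onsets


def syllabify(word, vowels, onsets):
    """Split a word, giving each intervocalic cluster's longest attested suffix to the next syllable."""
    # Assumption: a maximal run of vowel letters forms one nucleus.
    nuclei, i = [], 0
    while i < len(word):
        if word[i] in vowels:
            j = i
            while j < len(word) and word[j] in vowels:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    boundaries = []
    for (_, end), (start, _) in zip(nuclei, nuclei[1:]):
        cluster = word[end:start]
        # Try the longest suffix first; "" is always in onsets, so this terminates.
        for k in range(len(cluster) + 1):
            if cluster[k:] in onsets:
                boundaries.append(end + k)
                break
    cuts = [0] + boundaries + [len(word)]
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]
```

On the toy corpus `["tapa", "topo", "stop", "pasta", "sata"]`, `sukhotin_vowels` returns `{"a", "o"}`. With those vowels, `syllabify("pasta", ...)` gives `["pa", "sta"]` because "st" is attested word-initially (in "stop"). `syllabify("sapta", ...)` gives `["sap", "ta"]` because "pt" never begins a word but "t" does. The legality constraint only sees which clusters occur at word edges. That kind of purely distributional heuristic may break down for languages whose syllable structures differ from the few languages such methods are usually evaluated on.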

Similar resources

Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language

There are several native languages in Peru which are mostly agglutinative. These languages are transmitted from generation to generation mainly in oral form, causing different forms of writing across different communities. For this reason, there are recent efforts to standardize the spelling in the written texts, and it would be beneficial to support these tasks with an automatic tool such as a...

Move-based investigation of appraisal in the introduction section of Applied Linguistics research articles: Similarities and differences between L1 and L2 English texts

Recent research has shown that academic writing is not ‘author-evacuated’ but, rather, carries a representation of the writers’ identity. One way through which writers project their identity in academic writing is stance-taking toward propositions advanced in the text. Appropriate stance-taking has proved to be challenging for novice writers of Research Articles (RAs), especially those writing ...

Heuristic Syllabification and Statistical Syllable-Based Modeling for Speech-Input Topic Identification

We describe a heuristic syllabification method and the use of a statistical syllable n-gram language model for discriminating between a closed set of topics. The syllabification method works by assigning costs to consonant clusters and then splitting the clusters where the cost is minimized. We apply the syllabification on a pronunciation dictionary which maps words to phone sequences; the resu...

Automatic Syllabification for Manipuri language

Developing hand-crafted rules for syllabifying the words of a language is an expensive task. This paper proposes several data-driven methods for the automatic syllabification of words written in the Manipuri language. Manipuri is one of the scheduled Indian languages. First, we propose a language-independent rule-based approach formulated using entropy-based phonotactic segmentation. Second, we project ...

Unsupervised Learning of a Chinese Spontaneous and Colloquial Speech Lexicon with Content and Filler Phrase Classification

There is a significant lexical difference (in words and in the usage of words) between spontaneous/colloquial language and the written language. This difference affects the performance of spoken language recognition systems that use statistical language models or context-free grammars, because these models are based on the written language rather than the spoken form. There are many filler phrases and colloqu...

Publication date: 2010